PCOS analysis

Agnes Lorenzen, Cecille Hobbs, Freja E. Klippmann, Julie Dalgaard Petersen & Mille Rask Sander

Introduction

Background

  • Polycystic ovary syndrome (PCOS) is a syndrome documented in women of menstruating age

  • Commonly documented symptoms include period pain, irregular periods, ovary-related problems and hormone imbalance

  • Patients with PCOS often have difficulty becoming pregnant and may experience complications during pregnancy

  • However, the cause of PCOS has still not been established.

Aim

The aim of this study is to examine a data set (found on Kaggle) of patients with and without PCOS. The data were collected in India and come from 10 different hospitals.

Data handling approach

  • Raw data:
    541 observations of 45 variables.
  • 01_load_data:
    Simply loads the data.
  • 02_clean_data:
    • Replace stray, malformed cell values with NA
    • Rename and factorize columns
    • Split data frame into body and blood measurements
    • Remove empty column
  • 03_augment:
    • Unit changes (inch to cm)
    • Round and group BMI
    • Change blood type and cycle from numeric values to characters
    • Create new column for cycle/pregnancy stage
    • Merge data frames into one file

# Rounding of BMI and dividing into categories
library(dplyr)

body_measurements <- body_measurements |>
  mutate(BMI = round(BMI, 1)) |>
  # Conditions are checked in order; each range is stated explicitly
  mutate(BMI_class = case_when(
    BMI < 18.5 ~ "Underweight",
    BMI >= 18.5 & BMI < 25 ~ "Normal weight",
    BMI >= 25 & BMI < 30 ~ "Overweight",
    BMI >= 30 ~ "Obesity")) |>
  # Ordered factor levels keep the classes in a sensible order in plots
  mutate(BMI_class = factor(BMI_class,
                            levels = c("Underweight",
                                       "Normal weight",
                                       "Overweight",
                                       "Obesity"))) |>
  relocate(BMI_class, .after = BMI)
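
The remaining augmentation steps (unit change and recoding numeric codes as characters) could be sketched in the same style. This is only an illustration on a toy tibble, not the project's actual script: the column names (`Hip_inch`, `Blood_group`, `Cycle`) and the numeric codes used below are assumptions made for the example.

```r
library(dplyr)

# Toy data; column names and numeric codes are hypothetical
body_toy <- tibble(
  Hip_inch    = c(36, 40),
  Blood_group = c(11, 15),
  Cycle       = c(2, 4)
)

body_toy <- body_toy |>
  # Unit change: 1 inch = 2.54 cm
  mutate(Hip_cm = Hip_inch * 2.54) |>
  # Numeric codes to readable labels (assumed coding)
  mutate(Blood_group = case_when(
    Blood_group == 11 ~ "A+",
    Blood_group == 15 ~ "O+",
    TRUE ~ NA_character_)) |>
  mutate(Cycle = case_when(
    Cycle == 2 ~ "Regular",
    Cycle == 4 ~ "Irregular",
    TRUE ~ NA_character_)) |>
  # Drop the original inch column once the cm column exists
  select(-Hip_inch)
```

Codes that are not listed fall through to NA, which makes unexpected values visible instead of silently mislabelling them.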

Data handling approach

02 Clean data
  • Replace stray, malformed cell values with NA
  • Rename and factorize columns
  • Split data frame into body and blood measurements
  • Remove empty column

03 Augment data
  • Unit changes (inch to cm)
  • Round and group BMI
  • Change blood type and cycle from numeric values to characters
  • Create new column for cycle/pregnancy stage
  • Merge data frames into one file
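
As a rough illustration of the cleaning steps and the final merge, the sketch below runs them on a small toy tibble. It is not the project's actual code: all column names (`id`, `PCOS (Y/N)`, `weight`, `hb`, `empty`) and values are hypothetical, and `full_join()` is only one possible way to merge the two parts.

```r
library(dplyr)

# Toy raw data; names and values are hypothetical
raw_toy <- tibble(
  id           = 1:3,
  `PCOS (Y/N)` = c(0, 1, 0),
  weight       = c("60", "a", "72.5"),  # "a" is a stray cell
  hb           = c(11.2, 10.5, 12.0),
  empty        = c(NA, NA, NA)
)

clean_toy <- raw_toy |>
  # Cells that cannot be parsed as numbers become NA
  mutate(weight = suppressWarnings(as.numeric(weight))) |>
  # Rename and factorize the diagnosis column
  rename(PCOS_diagnosis = `PCOS (Y/N)`) |>
  mutate(PCOS_diagnosis = factor(PCOS_diagnosis,
                                 levels = c(0, 1),
                                 labels = c("No", "Yes"))) |>
  # Remove columns that contain only NA
  select(where(~ !all(is.na(.x))))

# Split into body and blood measurements, keeping a shared key
body_toy  <- clean_toy |> select(id, PCOS_diagnosis, weight)
blood_toy <- clean_toy |> select(id, hb)

# Merge back into one data frame after augmentation
merged_toy <- full_join(body_toy, blood_toy, by = "id")
```

Keeping a shared key column in both parts is what makes the later merge unambiguous.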

Descriptive analysis of data


# A tibble: 2 × 1
  `PCOS dimensions`
              <int>
1               541
2                44
# A tibble: 2 × 2
  PCOS_diagnosis     n
  <chr>          <int>
1 No               364
2 Yes              177
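
Summaries like the two shown here (dimensions of the data and number of patients per diagnosis) can be produced with a few lines of dplyr. The sketch below uses a toy tibble, so the object name `pcos_toy` and its values are assumptions rather than the project's data.

```r
library(dplyr)

# Toy data standing in for the merged data set (hypothetical values)
pcos_toy <- tibble(
  PCOS_diagnosis = c("No", "Yes", "No", "No"),
  BMI            = c(21.3, 27.8, 19.4, 31.0)
)

# Rows and columns of the data set
dims <- tibble(`PCOS dimensions` = dim(pcos_toy))

# Number of patients per diagnosis
diagnosis_counts <- pcos_toy |>
  count(PCOS_diagnosis)

dims
diagnosis_counts
```

In the table from the actual data, 177 of the 541 patients (about 33 %) have a PCOS diagnosis.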


Analysis 1


Analysis 2


PCA of blood measurements

PCA of body measurements
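
For readers unfamiliar with the method, a PCA of numeric measurements could be run roughly as follows. This is a minimal sketch on simulated data using base R's `prcomp()`; the variable names and values are made up, and it is not necessarily how the PCA in this project was carried out.

```r
set.seed(1)

# Simulated body measurements (hypothetical, not the PCOS data)
body_sim <- data.frame(
  weight_kg = rnorm(50, mean = 60,  sd = 10),
  height_cm = rnorm(50, mean = 160, sd = 6),
  waist_cm  = rnorm(50, mean = 85,  sd = 8)
)

# Centre and scale so that variables in different units are comparable
pca_fit <- prcomp(body_sim, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each principal component
var_explained <- pca_fit$sdev^2 / sum(pca_fit$sdev^2)

# Scores of each observation on the principal components,
# e.g. for a PC1 vs PC2 scatter plot coloured by diagnosis
scores <- as.data.frame(pca_fit$x)
```

`prcomp()` does not accept missing values, so rows with NA would need to be dropped or imputed first.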


Discussion


Conclusion

  • No statistically significant results were found